American Journal of Epidemiology
Oxford University Press (OUP)
All preprints, ranked by how well they match American Journal of Epidemiology's content profile, based on 57 papers previously published here. The average preprint has a 0.05% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Hernandez, M. A.; Li, Z.; Cole, T.; Ong, Y. Y.; Tilling, K.; Elhakeem, A.
Investigating early life growth dynamics is crucial for a more comprehensive understanding of the developmental origins of obesity. Spline methods based on basis splines (B-splines) provide excellent flexibility for modelling complex nonlinear growth patterns, but they are prone to overfitting. To ensure good fit and avoid overfitting, B-splines can be extended by adding a penalty term to control their flexibility, resulting in what are commonly known as penalized B-splines (P-splines). Despite their strengths, P-splines are not yet widely used in epidemiology, partly due to a lack of practical guidance. This paper provides an illustrative guide to using P-spline linear mixed effects models to examine early life growth trajectories and estimate key growth features in longitudinal studies. After detailing P-spline theory and model fitting, we apply the method to repeated measurements of height, weight, and body mass index (BMI) up to age 10 years in a Southeast Asian birth cohort. We estimated infant growth velocity, and magnitude and timing of infant peak BMI and childhood rebound BMI, and explored sex differences, intercorrelations, and associations with prenatal factors. In our cohort, infant peak growth velocity was higher in boys than girls, ages of peak and rebound BMI had a negligible correlation, and greater birth length was associated with lower infant height velocity and higher weight velocity. We discuss practical considerations, alternative modelling approaches and provide recommendations for research. P-splines simplify the knot selection process, making them a valuable approach for growth modelling. R library, code and datasets are provided to accelerate uptake.
Yang, F. N.; Duyn, J. H.; Xie, W.
Understanding health differences among racial groups in child development is crucial for addressing inequalities that may affect various aspects of a child's life. However, factors such as household and neighborhood socioeconomic status (SES) often covary with health differences between races, making it challenging to accurately reveal these differences using conventional covariate-control methods such as multiple regression. Alternative methods, such as Propensity Score Matching (PSM), may provide better covariate control. Supporting this notion, we found that PSM is more sensitive than regression-based methods in detecting health differences between self-reported Black and White children across a wide range of behavioral and neural measurements in the ABCD study (5636 White, 1350 Black). Puberty status, an index of physical maturation, emerged as the largest difference between races and mediated the health differences between races on the majority of behavioral and neural variables. These findings highlight the importance of controlling for pubertal status and using more effective covariate-control methods to accurately represent health differences between Black and White children.
Bather, J. R.; Anyaso-Samuel, S.; Chen, Y.; Elliott, L.; Bennett, A. S.; Goodman, M. S.
Variation in binary outcomes over time by cluster size arises across various biomedical disciplines, including reproductive health, dental medicine, and psychiatric epidemiology. This study formally integrates modified Poisson regression with cluster-weighted generalized estimating equations (MP-CWGEE) for computing risk ratios in longitudinal studies with informative cluster sizes. Using a comprehensive Monte-Carlo simulation study, we empirically evaluated MP-CWGEE's statistical properties against alternative modeling approaches: MP-GEE, log-binomial CWGEE (LB-CWGEE), and log-binomial GEE (LB-GEE). We conducted 1,000 simulations across varying sample sizes, risk ratios, and informativeness degrees. MP-CWGEE demonstrated superior performance in model convergence, empirical bias, average estimated standard error, coverage, and Type 1 error control. While LB-CWGEE showed comparable results, its convergence rates were slightly inferior. The benefits of cluster-weighted models (MP-CWGEE and LB-CWGEE) over unweighted models (MP-GEE and LB-GEE) were pronounced in scenarios with informative cluster sizes. We demonstrated MP-CWGEE's practical application to a cohort study of people who used illicit opioids in New York City. We also provided implementation code for R, Stata, and SAS to facilitate wider adoption of the MP-CWGEE approach.
Hegde, S.; Eisenberg, J. N.; Beesley, L. J.; Mukherjee, B.
Epidemiologic data often violate common modeling assumptions of independence between subjects due to study design. Statistical separation is also common, particularly in the study of rare binary outcomes. Statistical separation for binary outcomes occurs when regions of the covariate space have no variation in the outcome, and separation can negatively impact the validity of logistic regression model parameters. When data are correlated, we generally use multi-level modeling for parameter estimation, and statistical approaches have also been developed for handling statistical separation. Approaches for analyzing data with both separation and complex correlation, however, are not well-known. Extending prior work, we demonstrate a two-stage Bayesian modeling approach to account for both separated and highly correlated data through a motivating example examining the effect of social ties on Acute Gastrointestinal Illness (AGI) in rural Ecuador. The two-stage approach involves fitting a Bayesian hierarchical model to account for correlation using priors derived from parameter estimates from a Firth-corrected logistic regression model to account for separation. We compare estimates from the two-stage approach to standard regression methods that only account for either separation or correlation. Our results demonstrate that correctly accounting for separation and correlation when both are present can potentially provide better inference.
Collin, L. J.; MacLehose, R. F.; Ahern, T. P.; Goodman, M.; Lash, T. L.
Background: An internal validation substudy compares an imperfect measurement of a variable with a gold standard measurement in a subset of the study population. Validation data permit calculation of a bias-adjusted estimate, expected to equal the association that would have been observed had the gold standard measurement been available for the entire study population. Guidance on optimal sampling of participants to include in validation substudies has not considered monitoring validation data as they accrue. In this paper, we develop and apply the framework of Bayesian monitoring to determine when sufficient validation data have been collected to yield a bias-adjusted estimate of association with a prespecified level of precision. Methods: We demonstrate the utility of this method using the Study of Transition, Outcomes and Gender--a cohort study of transgender and gender non-conforming children and adolescents. Transmasculine and transfeminine status were determined from the gender code in the electronic medical record at cohort enrollment. This status is known to be misclassified because it can indicate either gender identity or sex recorded at birth. Our interest is in the association between transmasculine and transfeminine status and self-inflicted injury. To address possible exposure misclassification, we demonstrate the method's ability to determine when sufficient validation data have been collected to calculate a bias-adjusted estimate of association that is less than 80% greater than the precision of the conventional estimate. Results: In the conventional age-adjusted analysis, we observed that transmasculine children and adolescents were 1.80-fold more likely to inflict self-harm than transfeminine youths (95%CI 1.27, 2.55). Using the adaptive validation approach, 200 cohort members were required for validation to yield a bias-adjusted estimate of OR=3.03 (95%CI 1.76, 5.56), which was similar to the bias-adjusted estimate using complete validation data (OR=2.63, 95%CI 1.67, 4.23). Conclusions: Our method provides a novel approach to effective and efficient estimation of classification parameters as validation data accrue. This method can be applied within the context of any parent epidemiologic study design, and modified to meet alternative criteria given specific study or validation study objectives.
Power, G. M.; Sanderson, E.; Davey Smith, G.; Hemani, G.
Background: Higher adiposity in early life has consistently been associated with a reduced risk of breast cancer in later life, with Mendelian randomization (MR) studies supporting a potential causal effect. However, concerns have been raised that selection bias, particularly collider stratification due to selective participation or survival, may induce spurious protective effects. Methods: We used a triangulation framework combining empirical analyses and simulations to evaluate whether selection-induced bias could plausibly explain the inverse effect of early life body size on breast cancer risk. First, we re-examined proxy-genotype MR analyses and conducted family-based simulations to assess whether attenuation in relative-based estimates could arise without selection bias. Second, we performed multivariable MR analyses of parental survival to evaluate survival-related selection mechanisms. Third, we conducted extensive simulations under a null causal model to quantify the magnitude of bias introduced by selection under a range of plausible and extreme scenarios, including interaction-driven selection. Results: Attenuation in proxy-genotype MR estimates was reproduced in simulations without selection bias, indicating that this pattern does not provide evidence for selection bias. Multivariable MR analyses of parental survival indicated that survival differences are primarily driven by adulthood, not childhood, adiposity, providing little support for survival-related selection acting through childhood body size. In simulation analyses, additive selection produced minimal bias, while interaction-driven selection generated increasing distortion; however, even under extreme scenarios, the magnitude of bias was insufficient to replicate the observed protective effect. When selection operated through adulthood body size, bias was confined largely to adulthood estimates. Across all scenarios, the joint pattern of univariable and multivariable MR findings was not reproduced under selection alone. Conclusions: Although selection bias can influence MR estimates, our findings suggest that plausible selection mechanisms are unlikely to fully explain the observed inverse effect of early life adiposity on breast cancer risk. These results support a causal interpretation of the protective effect and highlight the value of triangulating evidence across complementary approaches when evaluating bias in lifecourse MR.
Chen, D.; Shioda, K.; Brouwer, A.; Kraay, A.; Handel, A.; Lopman, B.; McQuade, E. R.; Nelson, K.
Background: The estimate of diarrhea burden attributed to a specific enteric pathogen--the population attributable fraction (PAF)--depends on the specific calculation method. Two conventional methods are commonly used to estimate the PAF for enteric infections: the "detection-as-etiology" (DE) method, which defines the PAF as the pathogen prevalence in diarrheal cases; and the "odds-ratio" (OR) method, which expresses the PAF as a function of the OR between pathogen detection and diarrhea. A third, less frequently used method uses the risk ratio (RR) to quantify the strength of infection. Methods: We compared each conventional PAF (DE, OR, or RR PAF) to a model-based (MB) PAF, derived from a transmission model of enteric infection, and defined bias as the crude difference from this "true" MB PAF. We fitted the transmission model to site-specific qPCR data for norovirus and rotavirus detection from MAL-ED (an eight-country birth cohort studying enteric infections) and used the equilibrium states to calculate the MB PAF. Results: For both pathogens, the OR and RR biases were small at all sites (ranging from -5% to +3%), whereas the DE method consistently overestimated the PAF and its bias was the largest of the conventional methods. Conclusions: Our mechanistic model provides an independent alternative to conventional methods, quantifying pathogen-specific enteric burden and the biases in those methods. Our model suggests the DE PAF estimations are consistently biased, and validates the OR and RR methods as feasible, low-bias measures for quantifying enteric burden.
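As a rough illustration of the two conventional calculations named in the abstract above, the sketch below computes a DE PAF and an OR PAF from a 2x2 table of pathogen detection by diarrhea status. The OR expression used here, p_cases × (1 − 1/OR), is the widely used Miettinen-type formula and is an assumption; the abstract does not state the exact formula the authors applied. The function names and all counts are hypothetical.

```python
def de_paf(cases_pos, cases_neg):
    """Detection-as-etiology PAF: pathogen prevalence among diarrheal cases."""
    return cases_pos / (cases_pos + cases_neg)


def odds_ratio(cases_pos, cases_neg, controls_pos, controls_neg):
    """Odds ratio between pathogen detection and diarrhea from a 2x2 table."""
    return (cases_pos * controls_neg) / (cases_neg * controls_pos)


def or_paf(cases_pos, cases_neg, controls_pos, controls_neg):
    """OR-based PAF, assuming the Miettinen-type form p_cases * (1 - 1/OR)."""
    p_cases = de_paf(cases_pos, cases_neg)
    or_value = odds_ratio(cases_pos, cases_neg, controls_pos, controls_neg)
    return p_cases * (1.0 - 1.0 / or_value)


if __name__ == "__main__":
    # Hypothetical counts: 30 of 100 diarrheal stools and 10 of 100
    # non-diarrheal stools test positive for the pathogen.
    print("DE PAF:", de_paf(30, 70))
    print("OR PAF:", or_paf(30, 70, 10, 90))
```

With these made-up counts the DE PAF exceeds the OR PAF because the DE method attributes every detection in a case to the pathogen, whereas the OR form discounts detections that would be expected without diarrhea.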
Salvatore, M.; Kundu, R.; Du, J.; Friese, C. R.; Mondul, A. M.; Hanauer, D. A.; Lu, H.; Pearce, C. L.; Mukherjee, B.
Electronic health records (EHRs) are valuable for public health and clinical research but are prone to many sources of bias, including missing data and non-probability selection. Missing data in EHRs is complex due to potential non-recording, fragmentation, or clinically informative absences. This study explores whether polygenic risk score (PRS)-informed multiple imputation for missing traits, combined with sample weighting, can mitigate missing data and selection biases in estimating disease-exposure associations. Simulations were conducted for missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) conditions under different sampling mechanisms. PRS-informed multiple imputation showed generally lower bias, particularly when combined with sample weighting. For example, in biased samples of 10,000 with exposure and outcome MAR data, PRS-informed imputation had lower percent bias (3.8%) and better coverage rate (0.883) compared to PRS-uninformed (4.5%; 0.877) and complete case analyses (10.3%; 0.784) in covariate-adjusted, weighted, multiple imputation scenarios. In a case study using Michigan Genomics Initiative (n=50,026) data, PRS-informed imputation aligned more closely with a sample-weighted All of Us-derived benchmark than analyses ignoring missing data and selection bias. Researchers should consider leveraging genetic data and sample weighting to address biases from missing data and non-probability sampling in biobanks.
Oh, E. J.; Mikytuck, A.; Lancaster, V.; Goldstein, J.; Keller, S.
Understanding the prevalence of infections in the population of interest is critical for making data-driven public health responses to infectious disease outbreaks. Accurate prevalence estimates, however, can be difficult to calculate due to a combination of low population prevalence, imperfect diagnostic tests, and limited testing resources. In addition, strategies based on convenience samples that target only symptomatic or high-risk individuals will yield biased estimates of the population prevalence. We present Bayesian multilevel regression and poststratification models that incorporate probability sampling designs, the sensitivity and specificity of a diagnostic test, and specimen pooling to obtain unbiased prevalence estimates. These models easily incorporate all available prior information and can yield reasonable inferences even with very low base rates and limited testing resources. We examine the performance of these models with an extensive numerical study that varies the sampling design, sample size, true prevalence, and pool size. We also demonstrate the relative robustness of the models to key prior distribution assumptions via sensitivity analyses.
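To make the roles of test sensitivity, specificity, and pool size concrete, here is a minimal point-estimate sketch. It is not the Bayesian multilevel regression and poststratification model described in the abstract above; it swaps in the simpler, classical Rogan-Gladen correction and assumes that sensitivity does not degrade when specimens are pooled. All function names and numbers are hypothetical.

```python
def rogan_gladen(apparent_prev, sensitivity, specificity):
    """Correct an apparent positive fraction for an imperfect test."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test must perform better than chance")
    estimate = (apparent_prev + specificity - 1.0) / denom
    # Clamp to [0, 1]; sampling noise can push the raw estimate outside.
    return min(1.0, max(0.0, estimate))


def pool_positive_prob(prev, pool_size, sensitivity, specificity):
    """Probability that a pool of `pool_size` specimens tests positive."""
    any_infected = 1.0 - (1.0 - prev) ** pool_size
    return sensitivity * any_infected + (1.0 - specificity) * (1.0 - any_infected)


def prevalence_from_pools(pool_pos_frac, pool_size, sensitivity, specificity):
    """Invert the pooling model to recover individual-level prevalence."""
    any_infected = rogan_gladen(pool_pos_frac, sensitivity, specificity)
    return 1.0 - (1.0 - any_infected) ** (1.0 / pool_size)


if __name__ == "__main__":
    # Hypothetical setting: 2% prevalence, pools of 5, Se = 0.90, Sp = 0.99.
    q = pool_positive_prob(0.02, 5, 0.90, 0.99)
    print("expected pool-positive fraction:", q)
    print("recovered prevalence:", prevalence_from_pools(q, 5, 0.90, 0.99))
```

Unlike a full Bayesian model, this sketch gives no uncertainty interval and cannot incorporate prior information or a sampling design; it only shows how the three quantities enter the calculation.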
Wang, Y.; Pitre, T.; Wallach, J. D.; de Souza, R. J.; Jassal, T.; Bier, D.; Patel, C. J.; Zeraatkar, D.
Objective: To present an application of specification curve analysis--a novel analytic method that involves defining and implementing all plausible and valid analytic approaches for addressing a research question--to nutritional epidemiology. Data source: National Health and Nutrition Examination Survey (NHANES) 2007 to 2014 linked with National Death Index. Methods: We reviewed all observational studies addressing the effect of red meat on all-cause mortality, sourced from a published systematic review, and documented variations in analytic methods (e.g., choice of model, covariates, etc.). We enumerated all defensible combinations of analytic choices to produce a comprehensive list of all the ways in which the data may reasonably be analyzed. We applied specification curve analysis to NHANES data to investigate the effect of unprocessed red meat on all-cause mortality, using all reasonable analytic specifications. Results: Among 15 publications reporting on 24 cohorts included in the systematic review on red meat and all-cause mortality, we identified 70 unique analytic methods, each including different analytic models, covariates, and operationalizations of red meat (e.g., continuous vs. quantiles). We applied specification curve analysis to NHANES, including 10,661 participants. Our specification curve analysis included 1,208 unique analytic specifications. Of 1,208 specifications, 435 (36.0%) yielded a hazard ratio equal to or above 1 for the effect of red meat on all-cause mortality and 773 (64.0%) below 1, with a median hazard ratio of 0.94 [IQR: 0.83 to 1.05]. Forty-eight specifications (3.97%) were statistically significant, 40 of which indicated unprocessed red meat to reduce all-cause mortality and 8 of which indicated red meat to increase mortality. Conclusion: We show that the application of specification curve analysis to nutritional epidemiology is feasible and presents an innovative solution to analytic flexibility. Limitations: Alternative analytic specifications may address slightly different questions and investigators may disagree about justifiable analytic approaches. Further, specification curve analysis is time and resource-intensive and may not always be feasible.
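The enumeration step described in the abstract above, listing every combination of analytic choices, can be sketched as a Cartesian product. The choice categories below echo those the abstract mentions (model, covariates, operationalization of red meat), but the specific option names are hypothetical placeholders, not the authors' actual specifications, and a real analysis would also filter out combinations judged indefensible.

```python
from itertools import product

# Hypothetical analytic choices; the real study enumerated many more.
CHOICES = {
    "model": ["cox", "logistic"],
    "covariates": ["minimal", "demographic", "full"],
    "exposure_coding": ["continuous", "quantiles"],
}


def enumerate_specifications(choices):
    """Return one dict per combination of analytic choices."""
    keys = list(choices)
    return [
        dict(zip(keys, combo))
        for combo in product(*(choices[key] for key in keys))
    ]


if __name__ == "__main__":
    specs = enumerate_specifications(CHOICES)
    print(len(specs), "specifications")
    print(specs[0])
```

Each resulting dict would then drive one model fit, and the sorted effect estimates across all fits form the specification curve.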
Arinaminpathy, N.; Reed, C.; Biggerstaff, M.; Nguyen, A.; Athni, T. S.; Arnold, B. F.; Hubbard, A. E.; Colford, J. M.; Reingold, A.; Benjamin-Chung, J.
Background: Mathematical models and empirical epidemiologic studies (e.g., randomized and observational studies) are complementary tools but may produce conflicting results for a given research question. We used sensitivity analyses and bias analyses to explore such discrepancies in a study of the indirect effects of influenza vaccination. Methods: We fit an age-structured, deterministic, compartmental model to estimate indirect effects of a school-based influenza vaccination program in California that was evaluated in a previous matched cohort study. To understand discrepancies in their results, we used 1) a model with constrained parameters such that projections matched the cohort study; and 2) probabilistic bias analyses to identify potential biases (e.g., outcome misclassification due to incomplete influenza testing) that, if corrected, would align the empirical results with the mathematical model. Results: The indirect effect estimate (% reduction in influenza hospitalization among older adults in intervention vs. control) was 22.3% (95% CI 7.6% - 37.1%) in the cohort study but only 1.6% (95% Bayesian credible interval 0.4% - 4.4%) in the mathematical model. When constrained, mathematical models aligned with the cohort study when there was substantially lower pre-existing immunity among school-age children and older adults. Conversely, empirical estimates corrected for potential bias aligned with mathematical model estimates only if influenza testing rates were 15-23% lower in the intervention vs. comparison site. Conclusions: Sensitivity and bias analysis can shed light on why results of mathematical models and empirical epidemiologic studies differ for the same research question, and in turn, can improve study and model design.
Hall, L.; Chowell, G.
Objectives: To quantify the all-cause excess death rate of people living with HIV/AIDS (PWHA) during the multi-year 2020-2022 COVID-19 pandemic in the United States (U.S.), including stratifications by sex, age, race/ethnicity, and region. Design: Using publicly available data from the CDC NCHHSTP AtlasPlus dashboard, we employed the ensemble n-subepidemic modeling framework (SubEpiPredict toolbox). This dynamic, uncertainty-aware approach was used to generate counterfactual forecasts of U.S. deaths among PWHA for 2020-2022. Methods: The models were calibrated using 12 years of pre-pandemic mortality trends (2008-2019), with the median excess death rate calculated as the difference between forecasted and observed death rates. Results were stratified by age, sex, race/ethnicity, and U.S. region. Results: Overall excess mortality among PWHA was estimated at 7,783 crude excess deaths (95% prediction interval [PI]: 5,098-10,525), corresponding to 2.77 excess deaths per 100,000 people (95% PI: 1.81-3.75), with the largest burden observed in 2021. Excess death rates were highest among males (3.39), individuals aged 55-64 years (4.94), multiracial populations (12.82), and residents of the Northeast U.S. (4.12). In contrast, the largest absolute number of excess deaths occurred among males (4,692), adults aged 65 years and older (2,560), Black/African American individuals (3,969), and residents of the Southern U.S. (4,025). Conclusions: These systematic, model-based results reveal stark heterogeneities among PWHA by exposing recent mortality patterns that may not be captured by disease-specific mortality reporting alone. These heterogeneous findings can inform future public health programming and resource allocation and support tailored interventions for vulnerable populations.
Li, K.; Hou, Y.; Mukherjee, B.; Pitzer, V. E.; Weinberger, D. M.
Household transmission studies are important for understanding infectious disease transmission and evaluating interventions; however, they are frequently constrained by methodological challenges, including in study design and sample size determination, and in estimating parameters of interest after collecting the data. Existing tools often lack flexibility in modeling age-specific susceptibility, infectivity patterns, and the impact of interventions such as vaccination or prophylaxis. Here, we develop HHBayes, an open-source R package that provides a unified framework for simulating and analyzing household transmission data using Bayesian methods. The package enables researchers to: (1) simulate realistic household transmission dynamics with highly customizable variables; (2) incorporate viral load data (measured in viral copies/mL or cycle threshold values) to model time-varying infectiousness; (3) estimate age-dependent susceptibility and infectivity parameters using Hamiltonian Monte Carlo methods implemented in Stan; and (4) evaluate intervention effects through user-defined covariates that modify susceptibility or infectivity. We demonstrate the capabilities of the package through simulation studies showing accurate parameter recovery and applications to seasonal respiratory virus transmission, including the impact of vaccination and antiviral prophylaxis on household attack rates. HHBayes addresses a critical gap in infectious disease epidemiology by providing researchers with accessible tools for both prospective study design and retrospective data analysis. The flexibility of the package in handling complex household structures, time-varying infectiousness, and intervention effects makes it valuable for studying diverse pathogens.
Zhou, A.; Tian, H.; Patel, A.; Mason, A.; Yang, G.; Hypponen, E.; Burgess, S.
The doubly-ranked non-linear Mendelian randomization method can yield biased estimates when instrument strength varies across individuals due to gene-environment (GxE) interactions. We propose a simple strategy to mitigate this bias by modelling GxE interactions and removing the fitted GxE component from the exposure before stratification by the doubly-ranked method. In simulations, the proposed GxE correction strategy eliminated GxE-induced bias with null, linear and non-linear exposure-outcome relationships, and it did not introduce bias even when the effect modifier of the IV-exposure association was a confounder or was correlated with a mediator or collider of the exposure-outcome association. In empirical analyses of serum 25(OH)D, BMI, and LDL-C, falsification tests showed bias in the uncorrected doubly-ranked method. Under the selected panel of effect modifiers, the extent of bias attenuation achieved by GxE correction varied by exposure. GxE correction was most effective for LDL-C, with further support from analyses using negative controls (age at recruitment and sex) and coronary artery disease as a positive control. These findings provide proof-of-principle evidence that our proposed GxE correction strategy can mitigate GxE-induced bias in practice. Where applicable, we recommend implementing this GxE correction strategy as a sensitivity analysis to assess the robustness of findings from the doubly-ranked method.
Nethery, R. C.; Testa, C.; Tabb, L. P.; Hanage, W. P.; Chen, J. T.; Krieger, N.
Areal spatial misalignment, which occurs when data on multiple variables are collected using mismatched boundary definitions, is a ubiquitous obstacle to data analysis in public health and social science research. As one example, the emerging sub-field studying the links between political context and health in the United States faces significant spatial misalignment-related challenges, as the congressional districts (CDs) over which political metrics are measured and administrative units, e.g., counties, for which health data are typically released, have a complex misalignment structure. Standard population-weighted data realignment procedures can induce measurement error and invalidate inference, which has prompted the development of fully model-based approaches for analyzing spatially misaligned data. One such approach, atom-based regression models (ABRM), holds particular promise but has scarcely been used in practice due to the lack of appropriate software or examples of implementation. ABRM use "atoms", the areas created by intersecting all sets of units on which variables of interest are measured, as the units of analysis and build models for the atom-level data, treating the atom-level variables (generally unmeasured) as latent variables. In this paper, we demonstrate the feasibility and strengths of the ABRM in a case study of the association between political representatives' voting behavior (CD-level) and COVID-19 mortality rates (county-level) in a post-vaccine period. The adjusted ABRM results suggest that a more conservative voting record is associated with an increase in COVID-19 mortality rates, with estimated associations smaller in magnitude but consistent in direction with those of standard realignment methods. The results also indicate that ABRM may enable more robust confounding adjustment and more realistic uncertainty estimates, properly representing the uncertainties arising from all analytic procedures. We also implement the ABRM in modern optimized Bayesian computing programs and make our code publicly available, which may enable these methods to be more widely adopted.
Chan, L. Y. H.; Morris, S. E.; Stockwell, M. S.; Bowman, N. M.; Asturias, E.; Rao, S.; Lutrick, K.; Ellingson, K. D.; Nguyen, H. Q.; Maldonado, Y.; McLaren, S. H.; Sano, E.; Biddle, J. E.; Smith-Jeffcoat, S. E.; Biggerstaff, M.; Rolfes, M. A.; Talbot, H. K.; Grijalva, C. G.; Borchering, R. K.; Mellis, A. M.; RVTN-Sentinel Study Group
Background: Generation time, representing the interval between infection events in primary and secondary cases, is important for understanding disease transmission dynamics including predicting the effective reproduction number (Rt), which informs public health decisions. While previous estimates of SARS-CoV-2 generation times have been reported for early Omicron variants, there is a lack of data for subsequent sub-variants, such as XBB. Methods: We estimated SARS-CoV-2 generation times using data from the Respiratory Virus Transmission Network - Sentinel (RVTN-S) household transmission study conducted across seven U.S. sites from December 2021 to May 2023. The study spanned three Omicron sub-periods dominated by the sub-variants BA.1/2, BA.4/5, and XBB. We employed a Susceptible-Exposed-Infectious-Recovered (SEIR) model with a Bayesian data augmentation method that imputes unobserved infection times of cases to estimate the generation time. Findings: The estimated mean generation time for the overall Omicron period was 3.5 days (95% credible interval, CrI: 3.3-3.7). During the sub-periods, the estimated mean generation times were 3.8 days (95% CrI: 3.4-4.2) for BA.1/2, 3.5 days (95% CrI: 3.3-3.8) for BA.4/5, and 3.5 days (95% CrI: 3.1-3.9) for XBB. Interpretation: Our study provides estimates of generation times for the Omicron variant, including the sub-variants BA.1/2, BA.4/5, and XBB. These up-to-date estimates specifically address the gap in knowledge regarding these sub-variants and are consistent with earlier studies. They enhance our understanding of SARS-CoV-2 transmission dynamics by aiding in the prediction of Rt, offering insights for improving COVID-19 modeling and public health strategies. Funding: Centers for Disease Control and Prevention, and National Center for Advancing Translational Sciences.
Wang, J.; Ackley, S.; Chen, R.; Kezios, K.; Zeki Al Hazzouri, A.; Blacker, D.; Torres, J. M.; Glymour, M. M.
Background: The long preclinical phase of dementia can bias estimated effects of baseline exposures on dementia incidence. We demonstrate simulations informed by reverse Mendelian randomization (MR) findings to quantify the age-specific magnitude of reverse causation bias in observational analyses of the effects of body mass index (BMI) on dementia. Methods: We simulated longitudinal trajectories of BMI and dementia risk from ages 45 to 90 years, calibrating to published evidence on age-specific dementia incidence, BMI, and associations of dementia genetic risk with BMI. Under the null that BMI does not influence dementia and an alternative that BMI at any age increases subsequent dementia risk, we simulated hypothetical cohort studies (n=20,000, average 15 years of follow-up), varying age of entry from 45 to 80 years. In each hypothetical cohort, the association of z-standardized BMI at study entry with dementia incidence was estimated using Cox proportional hazards models. Bias was quantified using the ratio of observed to true hazard ratios (RHRs). All scenarios were replicated 500 times. Results: In the absence of a causal effect of BMI on dementia, when follow-up began at age 65 years, the RHR was 0.91 (95% CI: 0.90-0.92). When follow-up began at age 80 years, the RHR decreased to 0.68 (95% CI: 0.67-0.69), indicating substantial bias attributable to reverse causation. Conclusion: Reverse causation, presumably arising from preclinical dementia, can induce substantial bias in estimates of the association between baseline exposures and dementia incidence. Simulations provide a convenient tool to quantify this bias.
Joshi, K.; Kahn, R.; Boyer, C.; Lipsitch, M.
Background: Infectious disease models, including individual-based models (IBMs), can be used to inform public health response. For these models to be effective, accurate estimates of key parameters describing the natural history of infection and disease are needed. However, obtaining these parameter estimates from epidemiological studies is not always straightforward. We aim to 1) outline challenges to parameter estimation that arise due to common biases found in epidemiologic studies and 2) describe the conditions under which careful consideration in the design and analysis of the study could allow us to obtain a causal estimate of the parameter of interest. In this discussion we do not focus on issues of generalizability and transportability. Methods: Using examples from the COVID-19 pandemic, we first identify different ways of parameterizing IBMs and describe ideal study designs to estimate these parameters. Given real-world limitations, we describe challenges in parameter estimation due to confounding and conditioning on a post-exposure observation. We then describe ideal study designs that can lead to unbiased parameter estimates. We finally discuss additional challenges in estimating progression probabilities and the consequences of these challenges. Results: Causal estimation can only occur if we are able to accurately measure and control for all confounding variables that create non-causal associations between the exposure and outcome of interest, which is sometimes challenging given the nature of the variables we need to measure. In the absence of perfect control, non-causal parameter estimates should still be used, as sometimes they are the best available information we have. Conclusions: Identifying which estimates from epidemiologic studies correspond to the quantities needed to parameterize disease models, and determining whether these parameters have causal interpretations, can inform future study designs and improve inferences from infectious disease models. Understanding the way in which biases can arise in parameter estimation can inform sensitivity analyses or help with interpretation of results if the magnitude and direction of the bias is understood.
Domingo-Relloso, A.; Jerolon, A.; Tellez-Plaza, M.; Bermudez, J. D.
Objective: Whether several variables mediate the association between an exposure and a time-to-event outcome is a question of interest in epidemiologic research. However, to our knowledge, no tools have been developed for the evaluation of multiple correlated mediators in a survival setting. Methods: In this work, we extended the multimediate algorithm, which conducts mediation analysis in the context of multiple uncausally correlated mediators, to a time-to-event setting using the semiparametric additive hazards model. We theoretically demonstrated that, under certain assumptions, indirect, direct and total effects can be calculated using the counterfactual framework with collapsible survival models. We also adapted the algorithm to accommodate exposure-mediator interactions. Results and conclusions: Using simulations, we demonstrated that our algorithm performs better than the product of coefficients method, even for uncorrelated mediators. The additive hazards model quantifies the effects as rate differences, which constitute a measure of impact, with applications that can be highly informative for public health. Our algorithm can be found in the R package multimediate, which is available on GitHub.
Lankester, J.; Guarischi-Sousa, R.; Hilliard, A. T.; VA Million Veteran Program, ; Shere, L.; Husary, M.; Crowe, S.; Tsao, P. S.; Rehkopf, D. H.; Assimes, T. L.
Background: Breastfeeding has established health benefits for infants and has been associated with improved long-term postpartum maternal cardiometabolic health. However, breastfeeding prevalence is also inversely associated with prepartum body mass index (BMI), and both are linked to socioeconomic factors. We sought to clarify the relationship between prepartum BMI and breastfeeding prevalence in the Million Veteran Program (MVP), a large-scale genetic epidemiology study of US Veterans. Methods: We included data from parous female participants with available breastfeeding information from the MVP cohort. BMI at enrollment as well as earliest BMI available were extracted from the electronic health record, and polygenic scores (PGS) for BMI were calculated for the subset of participants with genotype data. We modeled whether participants breastfed an infant for one month or more (BF≥1M) as a function of BMI at enrollment (n=20,293); earliest BMI where available pre-pregnancy (n=532); and PGS for BMI among genetically inferred European ancestry participants (n=11,568). We conducted Mendelian randomization for breastfeeding using PGS as an instrumental variable. Results: A higher BMI predicted a lower likelihood of BF≥1M in all analyses. A 5 kg/m² higher pre-pregnancy BMI was associated with 24% lower odds of BF≥1M, and a 5 kg/m² higher genetically predicted BMI was associated with 17% lower odds of BF≥1M. Conclusions: Higher BMI predicts a lower likelihood of BF≥1M. Given the high success of breastfeeding initiation in supportive environments combined with potential health benefits to both infant and mother, pregnant Veterans with elevated prepartum BMI may benefit from additional postpartum breastfeeding support.
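To read the per-5 kg/m² estimates on a per-unit scale, the odds ratio can be rescaled as in the sketch below. This assumes a logistic model with a linear BMI term, so that the log-odds change proportionally with BMI; the abstract does not state the model form, and the function name is hypothetical.

```python
def rescale_odds_ratio(or_per_step: float, step: float, new_step: float = 1.0) -> float:
    """Rescale an odds ratio reported per `step` units to one per `new_step` units.

    Assumes the log-odds are linear in the exposure, so
    OR(new_step) = OR(step) ** (new_step / step).
    """
    if or_per_step <= 0 or step <= 0:
        raise ValueError("odds ratio and step must be positive")
    return or_per_step ** (new_step / step)


# 24% lower odds per +5 kg/m² corresponds to an odds ratio of 0.76 per 5 units.
or_per_5 = 1 - 0.24
or_per_1 = rescale_odds_ratio(or_per_5, step=5.0)  # odds ratio per +1 kg/m²
```

Under that linearity assumption, the per-unit odds ratio is about 0.95, that is, roughly 5% lower odds of BF≥1M per additional 1 kg/m² of pre-pregnancy BMI.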